An electrocardiogram (ECG) machine with a standard 12-lead configuration 
is the primary clinical technique for diagnosing abnormalities in heart 
function. Automated 12-lead ECG machines have the capacity to screen the 
general population and provide second opinions for physicians. However, 
expertise and time are required for manual ECG interpretation. Therefore, 
computer-aided diagnoses are of interest to the medical community. Hence, 
this study aims to build a deep learning (DL) model with an end-to-end 
structure that can categorize 12-lead ECG results into 27 different disorders. 
We use multivariate time-series data to construct a novel end-to-end DL 
model (based on combined convolutional neural networks (CNNs), long 
short-term memory, gated recurrent units, and a deep residual network 
structure) for feature representations and determining spatial relations 
among deep features. In addition, a dataset of 43,101 classified standard 
ECG recordings was collected from six different sources to support the 
model’s ability to generalize and alleviate data divergence. As a result, the 
residual network-based model obtained promising outcomes and an accuracy 
of 0.97. According to the experimental data, it outperforms other methods. 


1. INTRODUCTION 

According to the World Health Organization (WHO), worldwide, cardiovascular diseases (CVDs) 
are the leading cause of death, killing 18 million people annually. CVDs include coronary heart disease, 
cerebrovascular disease, rheumatic heart disease, and various heart and blood vessel issues. Heart attacks and 
strokes account for more than four in every five CVD deaths, with one-third occurring before age 70. 
However, treatment costs and sudden cardiac deaths can be reduced significantly with the help of accurate 
and early diagnoses [1]. Electrocardiogram (ECG) machines are widely used for CVD diagnosis due to their 
inexpensiveness, high accuracy, and non-invasive nature. They use 12 electrocardiograph leads to record the 
heart's electrical activity. 

The resulting sequence of electrical signals is recorded from different places on the human body [2]. 
However, skilled doctors are required to investigate and identify abnormal inter- and intra-beat patterns 
picked up by an ECG. Moreover, this process is time-consuming and vulnerable to inter-observer variability 
[3], making an automated ECG signal classification system essential, particularly in non-cardiology 
departments and pre-hospital care settings, where an expert may not always be accessible to interpret ECG 
signals [4]. Many of the earlier methods for automated ECG signal analysis were based on signal 
transformation (such as Fourier and wavelet), time-frequency, and frequency domain features [5]-[7]. 
However, in addition to their complexity, they were unable to capture complex features in ECG data. 
Recently, artificial intelligence (AI) and deep learning (DL) algorithms have been developed to process 
large-scale raw data, avoiding hand-crafted feature extraction methods [8]. Convolutional neural networks 
(CNNs) have achieved notable success in many fields, such as natural language processing [9] and computer 
vision [10]. These successes motivated researchers, as in [11], to propose a multi-layer 1-D recurrent neural 
network (RNN) trained on ECG data from a single lead. A CNN-based method has been suggested to 
increase classification accuracy [12]. In addition, two deep network models using short single-lead signals 
have been proposed for classifying pulse-generating and pulse-less rhythmic activities [13]. Another 
single-lead-based method has been presented that features an ensemble DL model for automating ECG signal 
classification [14]. 

In this method, ten classifiers are fused, which produced better results than the individual deep 
classifiers. A multi-stage learning model introduced features such as the frequency and rhythm of beats [15]. 
Inspired by the performances achieved by previous neural network-based methods, which proved their ability 
to capture nonlinearity and complex features, our proposed method is also based on a neural network 
approach. In most past approaches, two to nine heart abnormalities have been classified. Furthermore, most 
of the currently available methods handle ECG data from a single lead, even though 12-lead data are more 
widely used in real-life diagnostic settings. Additionally, most of these works treated diagnosis as a 
multi-class classification problem, while multiple abnormalities often appear in the same ECG record. Some 
proposed methods for analyzing 12-lead ECG data efficiently for 27 heart abnormalities treat this problem as 
a multi-label classification problem, allowing them to consider the presence of more than one abnormality 
simultaneously. The main contributions of this work can be summarized as follows: i) it presents an 
end-to-end model for classifying 27 heart abnormalities using 12-lead ECG signals, ii) it combines several 
techniques to enhance feature extraction and classification accuracy, iii) it reports extensive experiments on 
combined datasets from six different sources to assess whether the model generalizes, iv) its performance 
comparison indicates that the suggested method outperforms the compared methods without the need for 
pre-processing or manual feature engineering, and v) it treats the classification problem as a multi-label 
classification to handle multiple abnormalities in the same ECG record. 

Several studies have used 12-lead ECG data to classify heart abnormalities. Chen et al. [16] used a 
ResNet [17] structure with 1-D convolutional layers for feature extraction; their network outputs a 1x512 
vector for each lead. By concatenating the 12 resulting vectors, a matrix of size 12x512 is obtained, which is 
then fed into a long short-term memory (LSTM) layer and a fully connected layer for final classification. 
This method classified seven heart abnormalities and was trained on 7,704 samples. Liu et al. [18] used a 
biorthogonal wavelet transformation to denoise ECG signals. Then the E-ResNet [19] model was used as a 
baseline model. Furthermore, the Pan-Tompkins algorithm [20] was applied to detect the R-peaks. Baloglu 
et al. [21] proposed a deep CNN model with ten layers to classify 11 classes of 651 samples each. The 
suggested model was trained for each lead signal individually. Fayyazifar et al. [22] suggested a model of 
49 1-D CNN layers, one LSTM layer, and 16 skip connections to classify 27 ECG signal types. Leur et al. [8] 
utilized the 10-second raw data of 8-lead signals (I, II, and V1-V6), sampled at 500 Hz, as the input for a 
deep neural network with a structure similar to that of the Inception ResNet network, combining 
convolutional layers with skip connections in parallel. 

Gliner et al. [23] proposed two models trained on 41,830 samples. The first model uses the ECG 
signal data, while the second uses ECG plot images. For each model, CNN layers were used with batch 
normalization and dropout for feature extraction, then a fully connected layer and SoftMax activation layer 
were used to classify eight heart abnormalities. Nugent et al. [24] presented a method based on sub-dividing 
the classification space into bi-group classifiers generated through the deployment of neural networks. An 
evidential reasoning framework was combined with this method to accommodate any conflicts among the 
bi-group classifiers. This method was able to classify six ECG signal types using 12-lead data. 

The remainder of this paper is organized as follows: section 2 describes the dataset, the data preparation, 
and the problem statement, while section 3 details the proposed models. Next, section 4 provides the 
experimental details, section 5 reports the results, and section 6 discusses them. Finally, in section 7, 
conclusions are presented. 


2. METHOD AND MATERIALS 
2.1. Dataset 

The PhysioNet/CinC 2020 challenge [25] dataset is used in this work. It is a publicly available, 
multi-class, and multi-labeled ECG signal dataset containing 43,101 labeled ECG records. Furthermore, the 
dataset was collected from different sources, as shown in Table 1, which makes it well suited to testing the 
suggested method’s generalization ability. 

Table 1. The dataset sources and the number of records for each sub-dataset 

Dataset                                                                                                # Records 
The China Physiological Signal Challenge (CPSC) 2018 [26]                                              10,330 
The St. Petersburg Institute of Cardiological Technics (INCART) database of 12-lead arrhythmias [27]   74 
The Physikalisch-Technische Bundesanstalt (PTB) [28] and the more recent PTB-XL [29]                   22,353 
The Georgia 12-lead ECG Challenge (G12EC) database [25]                                                10,344 


No distinction is established between the data sources in this experiment and all records are pooled 
into a single repository. Furthermore, the metadata of each record includes the individual's biological 
information and gender. The average age of the participants is 60 years old. Females account for 46.9% of 
the participants, while 53.1% are males. Figure 1 shows an ECG sample from the dataset. Most of the records 
contain more than one diagnosis, and the total number of unique combinations of diagnoses is 1,414. The 
details of the 27 ECG abnormalities are shown in Figure 2, from which we infer a significant dataset class 
imbalance. Sinus rhythm (NSR) is present in more than 20,000 recordings, whereas PVCs were detected in 
fewer than 200 samples. Such an imbalance could undermine the model performance, as the model is likely to 
learn the diagnostic pattern from categories with many samples while ignoring the minority categories. 

Figure 1. Random ECG from the dataset 


2.2. Data preparation 

To test the model performance on unseen data, the data is split into 34,480 samples for training and 
8,621 samples for testing. Since each record in the dataset consists of 12-lead ECG sequences represented as 
multivariate time series, the correlation between the sequence ordering and the leads should be investigated 
to enhance model performance. Because the samples came from different data sources, as mentioned earlier, 
truncating and padding were used to unify the sample lengths: all the input records were fixed at 5,000 data 
points by truncating longer signals and padding the signals shorter than 5,000 data points with zeros. The 
input for the suggested model is therefore a matrix of size 5,000x12. 
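
The length unification described above can be sketched as follows. This is a minimal illustration, not the study's preprocessing code: the function name `fix_length` is hypothetical, and the choices to keep the first 5,000 samples and to append the zeros at the end are assumptions, since the text does not say where truncation or padding is applied.

```python
def fix_length(record, target_len=5000):
    """Truncate or zero-pad one record to exactly target_len time steps.

    record is a list of time steps; each time step is a list of lead values
    (12 values for a 12-lead ECG).
    """
    n_leads = len(record[0]) if record else 12
    # Assumption: keep the first target_len samples of a long record.
    trimmed = record[:target_len]
    # Assumption: zeros are appended after the signal for a short record.
    padding = [[0.0] * n_leads for _ in range(target_len - len(trimmed))]
    return trimmed + padding


short = [[1.0] * 12 for _ in range(3000)]
long_ = [[1.0] * 12 for _ in range(7000)]
print(len(fix_length(short)), len(fix_length(long_)))  # both records now have 5000 steps
```

Every record then has the same 5,000x12 shape, which is what allows the records to be stacked into batches.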


2.3. Problem statement 

For 27 classes of 12-lead ECG signals, the task can be formulated as a pattern classification problem. 
Each signal sample can be represented as a matrix (of size 5,000x12 after the preparation in section 2.2), so 
given the sequence $X = \{x[0], x[1], x[2], \ldots, x[n]\}$, a classifier is trained to learn the class as in (1): 


$$\hat{Y} = \arg\max_{c} f(C = c \mid X), \quad c = 1, 2, \ldots, 27 \tag{1}$$


where $C$ represents the labels for the record list, $x[n]$ is the real-valued input matrix for sample $n$, and 
$\hat{Y}$ is the class prediction. 

Figure 2. Total counts for 27 ECG abnormalities in the original dataset 


3. THE PROPOSED MODELS 

This study adopted five DL models to achieve high classification performance. The prediction time 
is considered, so all suggested models (except the Inception-based model) are designed with as few layers 
and parameters as possible. The implementations of these proposed models are discussed next. 


3.1. LSTM model 

A DL model based on LSTM is implemented to achieve high recognition performance on ECG 
signals derived from 12 leads. There are three gates in the LSTM unit: the input, forget, and output gates 
($i$, $f$, and $o$, respectively). Equations (3), (4), and (6) calculate the outputs of these gates, (2) gives the 
candidate input $y$, and $c$ and $h$ in (5) and (7) represent the cell state and the hidden state, respectively. 


$$y_t = \tanh(W_y x_t + R_y h_{t-1} + b_y) \tag{2}$$
$$i_t = \sigma(W_i x_t + R_i h_{t-1} + b_i + w_i \odot c_{t-1}) \tag{3}$$
$$f_t = \sigma(W_f x_t + R_f h_{t-1} + b_f + w_f \odot c_{t-1}) \tag{4}$$
$$c_t = i_t \odot y_t + f_t \odot c_{t-1} \tag{5}$$
$$o_t = \sigma(W_o x_t + R_o h_{t-1} + b_o + w_o \odot c_t) \tag{6}$$
$$h_t = o_t \odot \tanh(c_t) \tag{7}$$


where $x_t$ is the input at time $t$; $W$, $b$, and $R$ are the input weight, bias, and recurrent weight matrices 
of the LSTM unit, respectively; $w$ denotes the weights applied to the cell state inside the gates; $\sigma$ is the 
sigmoid function ($\sigma(x) = 1/(1 + e^{-x})$); and $\odot$ represents point-wise multiplication. 
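
To make the data flow in (2)-(7) concrete, the following sketch evaluates one time step of such a unit in plain Python. It is an illustration under stated assumptions rather than the implementation used in the study: the parameter dictionary layout, the helper names, and the lowercase `w_*` entries for the cell-state (peephole) weights are all hypothetical naming choices.

```python
import math


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def tanh(v):
    return [math.tanh(a) for a in v]


def matvec(m, v):
    return [sum(w * a for w, a in zip(row, v)) for row in m]


def add(*vs):
    return [sum(t) for t in zip(*vs)]


def mul(a, b):
    return [x * y for x, y in zip(a, b)]


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step following (2)-(7); p holds hypothetical parameter names."""
    y_t = tanh(add(matvec(p["W_y"], x_t), matvec(p["R_y"], h_prev), p["b_y"]))       # (2)
    i_t = sigmoid(add(matvec(p["W_i"], x_t), matvec(p["R_i"], h_prev), p["b_i"],
                      mul(p["w_i"], c_prev)))                                        # (3)
    f_t = sigmoid(add(matvec(p["W_f"], x_t), matvec(p["R_f"], h_prev), p["b_f"],
                      mul(p["w_f"], c_prev)))                                        # (4)
    c_t = add(mul(i_t, y_t), mul(f_t, c_prev))                                       # (5)
    o_t = sigmoid(add(matvec(p["W_o"], x_t), matvec(p["R_o"], h_prev), p["b_o"],
                      mul(p["w_o"], c_t)))                                           # (6)
    h_t = mul(o_t, tanh(c_t))                                                        # (7)
    return h_t, c_t


def zero_params(n_in, n_hidden):
    """All-zero parameters, used only to make the example easy to check by hand."""
    p = {}
    for g in "yifo":
        p["W_" + g] = [[0.0] * n_in for _ in range(n_hidden)]
        p["R_" + g] = [[0.0] * n_hidden for _ in range(n_hidden)]
        p["b_" + g] = [0.0] * n_hidden
        p["w_" + g] = [0.0] * n_hidden  # "w_y" is unused; kept for a uniform layout
    return p


h_t, c_t = lstm_step([0.1] * 12, [0.0, 0.0], [1.0, 1.0], zero_params(12, 2))
print(h_t, c_t)
```

With all-zero weights every gate equals 0.5, so the cell state is halved and the hidden state is 0.5 * tanh(0.5) per unit, which makes the roles of (5) and (7) easy to verify by hand.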

Two LSTM layers with 64 units each and one dense layer with 32 units (using ReLU as the activation 
function) are constructed to build a simple optimized model. For a review of this type of model, refer to [30]. 
Bidirectional LSTM is used, so the signal streams forward and backward at each time step, and the outputs of 
both streams are combined to compute the temporal relationship. The dropout technique (with a 30% rate) is 
used during the training process to avoid overfitting. The dropout mechanism turns off 30% of the dense 
layer neurons during training, which regularizes the network. Furthermore, the model is trained for only ten 
epochs to keep it from memorizing the training data. Figure 3 and Table 2 provide details of the 
layers’ structure and the parameters of each layer to enable the reader to rebuild the proposed model. 


Figure 3. LSTM model’s layer structure 


Table 2. LSTM model's parameters 

Layer               Details             # Parameters 
LSTM_1              64 units            19,712 
LSTM_2              64 units            33,024 
Dense_1             32 units, ReLU      2,080 
Dropout             0.3                 0 
Dense_2             27 units, Sigmoid   759 
Total # parameters                      55,575 
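
The LSTM and first dense rows of Table 2 can be checked with the usual parameter-count formulas. The sketch below is a hedged aid for readers rebuilding the model: the function names are hypothetical, and the formulas assume a unidirectional LSTM layer without cell-state (peephole) weights, which is the case that reproduces the listed counts.

```python
def lstm_params(n_in, units):
    # Four weight blocks (candidate plus three gates), each with
    # input weights, recurrent weights, and a bias vector.
    return 4 * (units * (n_in + units) + units)


def dense_params(n_in, units):
    # One weight per input-output pair plus one bias per output unit.
    return n_in * units + units


print(lstm_params(12, 64))   # LSTM_1: 12 leads in, 64 units
print(lstm_params(64, 64))   # LSTM_2: 64 features in, 64 units
print(dense_params(64, 32))  # Dense_1: 64 features in, 32 units
```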


3.2. The hybrid CNN-LSTM model 

The CNN-LSTM design is used to handle the nonlinearity and complexity of the 12-lead data. In 
addition, the CNN-LSTM design is used for specific sequence prediction problems with spatial inputs, such 
as those related to video or audio [31]. Using this method, the CNN can learn the relevant features from the 
ECG signals coming from different leads, while the LSTM bridges a long-time lag between the inputs over 
arbitrary time intervals. Practical features can be obtained because the LSTM can depict temporal patterns at 
different frequencies. One 1-D CNN layer (with 64 filters and a kernel size of 8) and one max-pooling layer 
(used for dimensionality reduction and speeding up the training process) are followed by one LSTM layer 
with 128 units and a dense layer with 27 neurons (with a Sigmoid activation function for multi-label class 
prediction). Figure 4 and Table 3 provide the details of the layers’ structure and the parameters of each layer 
to enable the reader to rebuild the proposed model. 
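
As a rough check on how much the convolution and pooling stages shorten the sequence before it reaches the LSTM, the output lengths can be computed directly. This sketch assumes unpadded ("valid") convolution and non-overlapping pooling, neither of which is stated in the text, and uses the pool size of 4 listed in Table 3; the function names are hypothetical.

```python
def conv1d_out_len(n, kernel_size, stride=1):
    # Length after an unpadded ("valid") 1-D convolution.
    return (n - kernel_size) // stride + 1


def pool_out_len(n, pool_size):
    # Length after non-overlapping max-pooling (stride equal to pool size).
    return (n - pool_size) // pool_size + 1


after_conv = conv1d_out_len(5000, kernel_size=8)
after_pool = pool_out_len(after_conv, pool_size=4)
print(after_conv, after_pool)  # 4993 1248
```

Under these assumptions the LSTM processes 1,248 steps of 64 features instead of 5,000 steps of 12 leads, which is consistent with the short training time reported for this model.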


3.3. The hybrid LSTM-GRU model 

Cho et al. [32] introduced the gated recurrent unit (GRU), which can adaptively capture dependencies 
across different timescales. As with the LSTM, each GRU employs gating units to control the flow of 
information inside the unit. A GRU is a simplified version of the LSTM, as it has only two gates (the reset 
gate $r_t$ and the update gate $z_t$ in (8) and (9)) in its architecture. It performs well in practice and mitigates 
the vanishing gradient problem [33]. The hidden state at time $t$ can be calculated via (10). 


$$r_t = \sigma(W_r x_t + R_r h_{t-1}) \tag{8}$$
$$z_t = \sigma(W_z x_t + R_z h_{t-1}) \tag{9}$$
$$h_t = f(W x_t + h_{t-1}) \tag{10}$$

The update value of the unit activation at time $t$ is $z_t$, which can be found using (11): 

$$z_t = \sigma(W_z x_t + R_z h_{t-1}) \tag{11}$$

The candidate activation $\tilde{h}_t$ is calculated as in (12): 

$$\tilde{h}_t = \tanh(W x_t + R(r_t \odot h_{t-1})) \tag{12}$$


where $\odot$ is the element-wise multiplication process. 
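
A single GRU time step built from the reset gate, update gate, and candidate activation can be sketched as below. The final interpolation between the previous hidden state and the candidate is the standard GRU update from the literature and is an assumption here, since (10) is given only in a generic form; the parameter names in the dictionary are likewise hypothetical.

```python
import math


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def matvec(m, v):
    return [sum(w * a for w, a in zip(row, v)) for row in m]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def mul(a, b):
    return [x * y for x, y in zip(a, b)]


def gru_step(x_t, h_prev, p):
    """One GRU time step; p holds hypothetical parameter names."""
    r_t = sigmoid(add(matvec(p["W_r"], x_t), matvec(p["R_r"], h_prev)))  # reset gate (8)
    z_t = sigmoid(add(matvec(p["W_z"], x_t), matvec(p["R_z"], h_prev)))  # update gate (9)/(11)
    pre = add(matvec(p["W"], x_t), matvec(p["R"], mul(r_t, h_prev)))
    h_cand = [math.tanh(a) for a in pre]                                 # candidate (12)
    # Assumed standard update: blend the old state and the candidate using z_t.
    return [(1.0 - z) * hp + z * hc for z, hp, hc in zip(z_t, h_prev, h_cand)]


def zero_gru_params(n_in, n_hidden):
    """All-zero parameters, used only to make the example easy to check by hand."""
    p = {}
    for name in ("W_r", "W_z", "W"):
        p[name] = [[0.0] * n_in for _ in range(n_hidden)]
    for name in ("R_r", "R_z", "R"):
        p[name] = [[0.0] * n_hidden for _ in range(n_hidden)]
    return p


print(gru_step([0.1] * 12, [1.0, -2.0], zero_gru_params(12, 2)))  # [0.5, -1.0]
```

With zero weights both gates equal 0.5 and the candidate is zero, so the new state is half the old one; the update gate therefore controls how much of the previous state survives each step.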

Combining the LSTM and GRU enables the model to learn the features of the time-space data for the 
ECG signals. The model consists of a 16-unit LSTM layer, a 100-unit GRU layer, a 32-neuron dense layer (with 
a Relu activation function), and a 27-neuron dense layer (with a Sigmoid activation function) for multi-label 
classification. The model architecture is explained in Figure 5, while Table 4 lists the layer parameters. 


3.4. The hybrid CNN-GRU model 

Similar to the hybrid CNN-LSTM model, this next model uses a GRU layer instead of the LSTM. 
The CNN can extract features effectively, but as a feed-forward neural network, it has no memory of earlier 
inputs and cannot form recurrent connections over time-based data. The GRU units with their gates address 
this issue and mitigate the vanishing gradient problem. This model features a 1-D CNN layer with 
max-pooling for dimensionality reduction, followed by the GRU layer. The detailed model architecture is 
shown in Figure 6 and Table 5. 


3.5. The Inception-ResNet-v2 model 

In this model, the Inception-ResNet-v2 [17] network is utilized to categorize ECG signals. The 
network's architecture is depicted in Figure 7. It comprises three parts. In the first part, the stem, which has 
nine convolutional layers and two max-pooling layers, pre-processes the original input before it enters 
the Inception-ResNet blocks. The second part is illustrated in Figure 8. Figure 8(a) illustrates the 
Inception-ResNet-A block with two 3x3 inception kernels, and Figure 8(b) shows reduction A, which changes 
the feature-map dimensions. Figure 8(c) illustrates the Inception-ResNet-B block with an asymmetric filter 
combination of one 1x7 filter and one 7x1 filter in the inception module. Figure 8(e) illustrates the 
Inception-ResNet-C block with a small asymmetric filter combination of one 1x3 filter and one 3x1 filter; 
1x1 convolutions are utilized prior to the large filters in these blocks. Through asymmetric convolution 
splitting, the network increases the diversity of the filter patterns. In addition, the reduction blocks A and B, 
shown in Figures 8(b) and 8(d), adjust the dimensions between the Inception-ResNet stages to balance the 
dimensionality reduction of the Inception blocks. The final part is the classification layer, which includes 
pooling and a Sigmoid activation. 
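
One commonly cited side effect of the asymmetric splitting is a smaller weight count than a single square filter covering the same area. The comparison below is a back-of-the-envelope sketch: the channel count of 128 is a hypothetical value chosen for illustration and is not taken from the network described here.

```python
def conv_weights(k_h, k_w, c_in, c_out):
    # Weights of one convolution layer: kernel area times channel pairs, plus biases.
    return k_h * k_w * c_in * c_out + c_out


channels = 128  # hypothetical channel count, for illustration only
square = conv_weights(7, 7, channels, channels)
split = conv_weights(1, 7, channels, channels) + conv_weights(7, 1, channels, channels)
print(square, split, round(square / split, 2))
```

For these illustrative sizes the 1x7 plus 7x1 pair needs roughly 3.5 times fewer weights than one 7x7 filter.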


Figure 4. CNN-LSTM model’s layer structure 


Table 3. CNN-LSTM model's parameters 

Layer               Details                                      # Parameters 
CNN_1               64 units, kernel_size=8, strides=1, ReLU     6,208 
MaxPooling          Pool_size=4                                  0 
LSTM_1              128 units                                    98,816 
Dense_1             27 units, Sigmoid                            2,967 
Total # parameters                                               107,991 

Figure 5. LSTM-GRU model’s layer structure 


Table 4. LSTM-GRU model's parameters 

Layer               Details             # Parameters 
LSTM_1              16 units            1,856 
GRU                 100 units           35,100 
Dense_1             32 units, ReLU      3,232 
Dropout             0.3                 0 
Dense_2             27 units, Sigmoid   759 
Total # parameters                      40,947 

Figure 6. CNN-GRU model’s layer structure 


Table 5. CNN-GRU model's parameters 

Layer               Details                                      # Parameters 
CNN                 64 units, kernel_size=8, strides=1, ReLU     592 
MaxPooling          Pool_size=2                                  0 
Dense_1             32 units, ReLU                               544 
GRU                 100 units                                    39,900 
Dense_2             32 units, ReLU                               3,232 
Dropout             0.3                                          0 
Dense_3             27 units, Sigmoid                            759 
Total # parameters                                               45,027 

Figure 7. The architecture of the Inception-ResNet 

Figure 8. The architecture of the Inception-ResNet blocks: (a) Inception-ResNet A, (b) reduction A, 
(c) Inception-ResNet B, (d) reduction B, and (e) Inception-ResNet C 


4. RESULTS AND DISCUSSION 
4.1. Computing environment 

The training was conducted on a machine with a 1.7 GHz Intel Core i5 processor, 8 GB of RAM, 
64-bit Windows 10 Pro, and an NVIDIA graphics card with 2 GB of memory. Python was utilized as the 
primary programming language, and the TensorFlow package was leveraged to build the models. 


4.2. Loss function 

In this work, the binary cross-entropy loss function is used. Most samples carry more than one 
class, so the loss treats each class in each sample as a separate binary prediction. Equation (13) shows how 
the binary cross-entropy loss is calculated: 


$$L_{BCE} = -\sum_{i=1}^{N} \sum_{j=1}^{M} \left( p(x_{ij}) \log q(x_{ij}) + (1 - p(x_{ij})) \log(1 - q(x_{ij})) \right) \tag{13}$$


where $p(x_{ij})$ is the probability of class $j$ for sample $i$ in the target, $q(x_{ij})$ is the probability of 
class $j$ for sample $i$ in the prediction, $N$ is the number of samples, and $M$ is the number of classes. 
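
The double sum in (13) can be written out directly. The sketch below is illustrative only: the function name `bce_sum` and the clipping constant `eps` are assumptions added to avoid taking the logarithm of zero, and deep learning frameworks typically report a mean over samples and classes rather than this raw sum.

```python
import math


def bce_sum(targets, preds, eps=1e-7):
    """Binary cross-entropy summed over N samples and M classes, as in (13)."""
    total = 0.0
    for p_row, q_row in zip(targets, preds):       # loop over samples i
        for p, q in zip(p_row, q_row):             # loop over classes j
            q = min(max(q, eps), 1.0 - eps)        # keep log() finite
            total += p * math.log(q) + (1.0 - p) * math.log(1.0 - q)
    return -total


# One sample, two classes: class 0 present, class 1 absent, both predicted at 0.5.
print(bce_sum([[1, 0]], [[0.5, 0.5]]))  # 2 * ln(2), about 1.386
```

Because every class contributes its own term, a record can be penalized for several missed abnormalities at once, which is what makes this loss suitable for multi-label targets.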


4.3. Experimental setup 

The parameters of all the models examined in this study are initialized randomly, and for 
optimization, the Adam optimizer is used with an initial learning rate of 0.001. As callbacks, learning rate 
reduction is used with a patience of one, so that the learning rate is reduced by a factor of 0.1 when 
there is no improvement in the loss of the validation data after one epoch. Furthermore, to avoid overfitting, 
the early stopping technique is used with a patience of two to curtail training when there is no improvement 
in the validation data loss after two epochs. Table 6 shows the training time for one epoch for each model. 
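
How the two callbacks interact can be traced with a small simulation over a list of validation losses. This is a simplified sketch with hypothetical names and made-up loss values; it only loosely models plateau-based learning-rate reduction and early stopping and ignores details such as minimum-improvement thresholds.

```python
def simulate_callbacks(val_losses, lr=1e-3, factor=0.1, lr_patience=1, stop_patience=2):
    """Return (epoch, learning rate after the epoch) pairs until training stops."""
    best = float("inf")
    stale_lr = 0    # epochs without improvement since the last reduction
    stale_stop = 0  # epochs without improvement since the best loss
    history = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, stale_lr, stale_stop = loss, 0, 0
        else:
            stale_lr += 1
            stale_stop += 1
            if stale_lr >= lr_patience:
                lr *= factor        # reduce the learning rate by the given factor
                stale_lr = 0
        history.append((epoch, lr))
        if stale_stop >= stop_patience:
            break                   # early stopping
    return history


# Made-up validation losses: improvement stalls after the second epoch.
history = simulate_callbacks([0.30, 0.20, 0.21, 0.22, 0.10])
print(history)
```

In this made-up run the rate is reduced after epochs 3 and 4 and training stops after epoch 4, so the fifth loss value is never reached. With patience values this small, training can end within a few epochs of the first plateau.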


Table 6. The training time (in seconds) for one epoch for each model 

Model        The training time for one epoch (in sec) 
LSTM         1,241 
CNN-LSTM     153 
LSTM-GRU     9,625 
CNN-GRU      3,814 
Inception    985 


4.4. Evaluation metrics 

For evaluating the performance of these methods, four commonly used performance metrics are 
used in this study, namely, accuracy (14), recall (15), precision (16), and the area under the curve 
(AUC) [34]. 


$$Acc = \frac{True_p + True_n}{True_p + False_p + True_n + False_n} \tag{14}$$
$$Recall = \frac{True_p}{True_p + False_n} \tag{15}$$
$$Pre = \frac{True_p}{True_p + False_p} \tag{16}$$


Here, $True_p$ denotes a true positive, $True_n$ denotes a true negative, $False_p$ indicates a false 
positive, and $False_n$ indicates a false negative. The AUC aggregates performance across all 
possible thresholds for classification. The AUC value is contained in [0,1], and a higher value means better 
model performance. 
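
For multi-label outputs, the counts in (14)-(16) can be accumulated over every (record, class) slot. The sketch below pools the counts over all slots, which is an assumption because the text does not state how the counts are aggregated across classes; the function names and the toy labels are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives over all (record, class) slots."""
    tp = fp = tn = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 0 and p == 0:
                tn += 1
            else:
                fn += 1
    return tp, fp, tn, fn


def metrics(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    acc = (tp + tn) / (tp + fp + tn + fn)       # (14)
    rec = tp / (tp + fn) if tp + fn else 0.0    # (15)
    pre = tp / (tp + fp) if tp + fp else 0.0    # (16)
    return acc, rec, pre


# Toy example: two records, four classes each.
y_true = [[1, 0, 0, 1], [0, 1, 0, 0]]
y_pred = [[1, 0, 0, 0], [0, 1, 1, 0]]
print(metrics(y_true, y_pred))  # accuracy 0.75, recall 2/3, precision 2/3
```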


5. RESULTS 
As mentioned earlier, the dataset is split into 34,480 samples for training and 8,621 samples for 
testing. The loss function graphs for all the suggested models are shown in Figure 9 (in Appendix). Most 
models converge stably, with low variability between the training and validation data. The exception is the 
CNN-LSTM model, whose loss graph indicates some instability and high variability, a sign of overfitting. 
Figure 10 (in Appendix) shows the accuracy and precision graphs for the training and validation data over 
the entire dataset; note that the models mostly converge after six to eight epochs. 

The performances of all the proposed models are measured using the test data (8,621 samples that 
the models have not been trained on). Table 7 shows the performance of each model across the entire 
dataset. The results are further analyzed in the discussion section. 


Table 7. The performance metrics for the proposed models across the entire dataset 

Model        Accuracy   Recall   Precision   AUC    Loss 
LSTM         0.96       0.27     0.71        0.62   0.15 
CNN-LSTM     0.95       0.17     0.58        0.53   0.16 
LSTM-GRU     0.96       0.30     0.80        0.70   0.14 
CNN-GRU      0.96       0.30     0.82        0.67   0.14 
Inception    0.97       0.38     0.84        0.77   0.11 


6. DISCUSSION 

This study is among the first to use DL to diagnose 27 cardiac abnormalities automatically 
based on a large volume of data on 12-lead ECGs. We have shown that a DL technique is capable of 
categorizing 12-lead ECG results with high accuracy. The Inception model had the highest accuracy, 0.97, 
while the other models had accuracies of 0.95 to 0.96 (Table 7). These findings suggest that a DL method 
could be useful for ECG triage and could reduce the clinical workload through better prioritization of ECGs 
for interpretation by a cardiologist. Note that non-cardiologists accurately diagnose 35% to 95% of cardiac 
issues, with considerable variance among physicians and increases in performance with experience [35]-[37]. 

The dataset used in this study is challenging because it came from different sources and the classes 
are imbalanced. Furthermore, solving multi-label classification problems is more complicated than solving 
multi-class problems [38]. Table 7 shows that all the models combine high precision with low recall, 
indicating that when a sample was difficult to label, the models tended to predict no label rather than an 
incorrect one, which increases the false-negative error. According to Figures 11 and 12 (in Appendix), most 
models converged after a few epochs (six to eight epochs in most cases). The Inception model, which uses 
residual networks and CNN, obtained the best performance. 
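
The gap between accuracy and recall can be illustrated with sparse multi-label targets. In the sketch below all numbers are hypothetical and not taken from the study: 100 records each carry exactly 2 of 27 labels, and the predictor outputs no label at all.

```python
N_CLASSES = 27

# Hypothetical targets: record i carries labels i % 27 and (i + 5) % 27.
y_true = [[1 if j in (i % N_CLASSES, (i + 5) % N_CLASSES) else 0
           for j in range(N_CLASSES)] for i in range(100)]
# A predictor that never predicts any abnormality.
y_pred = [[0] * N_CLASSES for _ in range(100)]

pairs = [(t, p) for t_row, p_row in zip(y_true, y_pred) for t, p in zip(t_row, p_row)]
correct = sum(1 for t, p in pairs if t == p)
true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
positives = sum(t for t, _ in pairs)

accuracy = correct / len(pairs)
recall = true_pos / positives
print(round(accuracy, 3), recall)  # 0.926 0.0
```

Because most label slots are negative, per-slot accuracy stays above 0.9 even when no abnormality is detected, which is why recall and the AUC are the more informative columns when comparing the models.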

In addition, the other models achieved close results despite their simple structures. Of the models 
that used CNN for feature extraction (CNN-LSTM and CNN-GRU), the results obtained by the GRU-based 
model were significantly better in terms of precision, recall, and the AUC. On the other hand, the 
LSTM-GRU obtained results close to those of the CNN-GRU model but with a higher training time 
(see Table 6), as the max-pooling layer in the CNN-GRU model provides dimensionality reduction. The 
diversity of the dataset’s sources is an advantage for testing the generalization abilities of the models. 
Furthermore, we trained the models using the PTB-XL dataset to test the proposed models on a single-source 
dataset. The results in Table 8 and Figures 11 and 12 (in Appendix) show significant increases in recall 
for all models (fewer false negatives) and in the AUC for the Inception model. 


Table 8. The performance metrics for the proposed models on the PTB-XL dataset 

Model        Accuracy   Recall   Precision   AUC    Loss 
LSTM         0.95       0.48     0.83        0.59   0.16 
CNN-LSTM     0.95       0.48     0.83        0.55   0.16 
LSTM-GRU     0.95       0.49     0.84        0.69   0.15 
CNN-GRU      0.95       0.49     0.84        0.63   0.15 
Inception    0.97       0.64     0.87        0.84   0.10 


6.1. Comparison with other models 

Many methods have been suggested in the literature for ECG signal classification. Nevertheless, the 
number of classes and leads used differ across these studies, which should be considered when comparing 
their performances. Table 9 compares this paper’s methods and some related approaches (N and MI stand for 
normal class and myocardial infarction, respectively). The comparison shows that our Inception method 
achieved the highest accuracy among the compared methods, although it classifies 27 multi-label classes and 
uses a dataset based on different sources. The 27 ECG abnormalities and their corresponding abbreviations 
are listed in Table 10. 

Table 9. Comparison of the methods with related methods 

Study        # Leads   # Classes   Dataset                 Method                        Accuracy 
[39]         12        2           N 31,722; MI 49,930     CNN                           0.935 
[40]         3         2           ...                     Fourier/logistic regression   0.956 
[41]         12        9           ... challenge           CNN-LSTM                      0.946 
[16]         12        7           7,104 samples           ResNet-LSTM                   0.81 
This study   12        27          [25]                    Inception                     0.97 

Table 10. ECG abnormalities and their corresponding abbreviations 
ECG abnormality Abbreviation 
1st degree AV block IAVB 
Atrial fibrillation AF 
Atrial flutter AFL 
Bradycardia Brady 
Complete right bundle branch block CRBBB 
Incomplete right bundle branch block IRBBB 
Left anterior fascicular block LAnFB 
Left axis deviation LAD 
Left bundle branch block LBBB 
Low QRS voltages LQRSV 
Nonspecific intraventricular conduction disorder NSIVCB 
Pacing rhythm PR 
Premature atrial contraction PAC 
Premature ventricular contractions PVC 
Prolonged PR interval LPR 
Prolonged QT interval LQT 
Q wave abnormal QAb 
Right axis deviation RAD 
Right bundle branch block RBBB 
Sinus arrhythmia SA 
Sinus bradycardia SB 
Sinus rhythm NSR 
Sinus tachycardia STach 
Supraventricular premature beats SVPB 
T wave abnormal TAb 
T wave inversion TInv 
Ventricular premature beats VPB 
7. CONCLUSION 


This work presents an end-to-end method for automatic 12-lead ECG classification. Among five 
suggested models, the Inception network-based model achieved the best performance, with an accuracy of 
0.97. The suggested model classifies 27 multi-label abnormalities indicated by 12-lead ECG signals, while 
the related methods classify nine types, at most. Experiments show that our suggested model outperforms the 
other related models both on a large dataset based on different sources and on a dataset from a single source. 
Additionally, because all of the datasets utilized are real-world data, we believe that this approach can be 
developed and applied in the medical field or used as a screening tool in conditions/locations where access to 
a 12-lead ECG is limited. We suggest addressing the imbalanced dataset problem in future work, which can be 
done by collecting more samples or through various techniques, such as data generation, to improve the 
classification of rare heart abnormalities. 

Figure 9. Loss per epoch for the models trained on the entire dataset 
Figure 10. Accuracy per epoch for the models trained on the entire dataset 
Figure 11. Loss per epoch for the models trained on the PTB-XL dataset 
Figure 12. Accuracy per epoch for the models trained on the PTB-XL dataset 